Abstract: We study the problem of multiple hypothesis testing (HT) with a rejection option, a model of HT that has many different applications. We consider the errors in testing M hypotheses regarding the source distribution when there is an option of rejecting all of those hypotheses. The source is discrete and arbitrarily varying (AVS). The tradeoffs among the error probability exponents (reliabilities) associated with false acceptance of the rejection decision and false rejection of the true distribution are investigated, and the optimal decision strategies are outlined. The main result is specialized to discrete memoryless sources (DMS) and studied further. An interesting insight implied by the analysis is the phenomenon (comprehensible in terms of supervised/unsupervised learning) that optimal discrimination within M hypothetical distributions always permits a lower error than deciding to decline the set of hypotheses. Geometric interpretations of the optimal decision schemes are given for the current and known bounds in multi-HT for AVS's.

I. Introduction 

Recent rapid progress in computer and public network infrastructure, as well as in multimedia data manipulation software, has created unprecedented yet often uncontrolled possibilities for multimedia content modification and redistribution over various public services and networks, including Flickr and YouTube. Since in many cases these actions concern privacy-sensitive data, a significant research effort has targeted efficient means of their identification as well as the related performance analysis [10], [11], [16]. While early reported results [12] were mostly dedicated to the capacity analysis of identification systems, more recent considerations are based on the multiple HT framework with a rejection option. Possible examples for binary data statistics are presented in [19] and [?]. Motivated by the prior art, we extend the problem of content identification as multiple HT with rejection to a broader class of source priors including AVS's. Our analysis lies within the framework of the works by Hoeffding [1], Csiszár and Longo [2], Blahut [?], Haroutunian [6], Birgé [?], Fu and Shen [?], Tuncel [13], and Grigoryan and Harutyunyan [20], with the aim of specifying the asymptotic bounds for error probabilities. Those papers do not treat an option of rejection. In particular, [?] characterizes the optimum relation between two error exponents in binary HT, while [6] (see also [14], [18]) and [13] study the multiple (M > 2) HT for DMS's in terms of logarithmically asymptotic optimality (LAO) and error exponent achievability, respectively. Later advances in the binary and M-ary HT for a more general class of sources, AVS's (see also their coding framework [?]), are the subjects of [9] and [20], respectively. The latter also derives Chernoff bounds for HT on AVS's and extends the finding by Leang and Johnson [?] for DMS's. Our work is a further extension of M-ary HT for discrete sources in terms of errors occurring with respect to an additional rejection decision. The focus is on the attainable region of error exponents, which trade off between the false acceptance of the rejection decision and the false rejection of the true distribution. A similar model of HT with empirically observed statistics for Markov sources has been explored by Gutman in [5]. Compared to [5], we take a new look at the compromises among error events. We still assume that the observations upon which the decision making is performed are available from the source without noise. A further expansion of this subject could consider decision making based on corrupted source samples.

II. Models of source and HT 

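Let $\mathcal{X}$ and $\mathcal{S}$ be finite sets: the alphabet of an information source and the set of its states, respectively. Let $\mathcal{P}(\mathcal{X})$ be the set of all probability distributions (PD) on $\mathcal{X}$. The source in our focus is defined by the following family of conditional PD's $G^*_s$ depending on an arbitrarily and not probabilistically varying source state $s \in \mathcal{S}$:

$$\mathcal{G}^* \triangleq \{G^*_s,\ s \in \mathcal{S}\}, \tag{1}$$

with $G^*_s \triangleq \{G^*(x|s),\ x \in \mathcal{X}\}$. An output source vector $\mathbf{x} \triangleq (x_1, \ldots, x_N) \in \mathcal{X}^N$ will have the following probability if dictated by a state vector $\mathbf{s} \in \mathcal{S}^N$: $G^{*N}(\mathbf{x}|\mathbf{s}) \triangleq G^*_{\mathbf{s}}(\mathbf{x}) \triangleq \prod_{n=1}^{N} G^*(x_n|s_n)$. Furthermore, the probability of a subset $\mathcal{A}_N \subset \mathcal{X}^N$ subject to $\mathbf{s} \in \mathcal{S}^N$ is measured by the sum

$$G^{*N}(\mathcal{A}_N|\mathbf{s}) \triangleq G^*_{\mathbf{s}}(\mathcal{A}_N) \triangleq \sum_{\mathbf{x} \in \mathcal{A}_N} G^*_{\mathbf{s}}(\mathbf{x}).$$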

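Our model of HT is determined by $M + 1$ hypotheses about the source distribution (1):

$$H_m: \mathcal{G}^* = \mathcal{G}_m,\ m = \overline{1,M}, \qquad H_R: \text{none of } H_m\text{'s is true}, \tag{2}$$

with $\mathcal{G}_m \triangleq \{G_{m,s},\ s \in \mathcal{S}\}$, where $G_{m,s} \triangleq \{G_m(x|s),\ x \in \mathcal{X}\}$, $s \in \mathcal{S}$, $m = \overline{1,M}$.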

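Let $G_m$ be the stochastic matrix defined by (2). Based on $N$ observations of the source one should make a decision in favor of one of those hypotheses. Typically it can be performed by a decision maker/detector applying a test $\varphi_N$ as a partition of $\mathcal{X}^N$ into $M + 1$ disjoint subsets $\mathcal{A}_N^m$, $m = \overline{1,M}$, and $\mathcal{A}_N^R$. If $\mathbf{x} \in \mathcal{A}_N^m$, then the test adopts the hypothesis $H_m$. If $\mathbf{x} \in \mathcal{A}_N^R$, the test rejects all the hypotheses $H_m$, $m = \overline{1,M}$. The test design aims at achieving certain levels of errors during the process of decision making. $(M + 1)M$ different kinds of errors, denoted by $\alpha_{l,m}(\varphi_N)$ and $\alpha_{R,m}(\varphi_N)$, $l \neq m = \overline{1,M}$, are possible. The probability of an erroneous acceptance of the hypothesis $H_l$ when $H_m$ was true is

$$\alpha_{l,m}(\varphi_N) \triangleq \max_{\mathbf{s} \in \mathcal{S}^N} G_m^N(\mathcal{A}_N^l|\mathbf{s}), \quad 1 \le l \neq m \le M. \tag{3}$$

The error probability of false rejection when $H_m$ was true is defined by

$$\alpha_{R,m}(\varphi_N) \triangleq \max_{\mathbf{s} \in \mathcal{S}^N} G_m^N(\mathcal{A}_N^R|\mathbf{s}), \quad m = \overline{1,M}. \tag{4}$$

Another type of error is a wrong decision of any kind when $H_m$ is true, which has the probability

$$\alpha_m(\varphi_N) \triangleq \max_{\mathbf{s} \in \mathcal{S}^N} G_m^N(\overline{\mathcal{A}_N^m}|\mathbf{s}) = \sum_{l \neq m} \alpha_{l,m}(\varphi_N) + \alpha_{R,m}(\varphi_N), \quad m = \overline{1,M}. \tag{5}$$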
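As a hedged illustration of definitions (3)-(5), the sketch below computes the worst-case error probabilities of a toy test by brute force over all state vectors. The binary alphabet, the two states, the matrix `G1`, the block length and the decision regions are all made-up assumptions chosen to keep the enumeration small; none of them come from the paper.

```python
import itertools
import numpy as np

def seq_prob(G, x, s):
    """Probability of sequence x under state vector s: prod_n G[s_n, x_n]."""
    return float(np.prod([G[sn, xn] for xn, sn in zip(x, s)]))

def set_prob(G, A, s):
    """Probability of a set A of sequences under a fixed state vector s."""
    return sum(seq_prob(G, x, s) for x in A)

def worst_case_prob(G, A, n_states, N):
    """Maximum of G^N(A | s) over all state vectors s in S^N."""
    return max(set_prob(G, A, s)
               for s in itertools.product(range(n_states), repeat=N))

N = 3
# Hypothetical stochastic matrix for H_1: rows are states, columns are symbols.
G1 = np.array([[0.9, 0.1],
               [0.8, 0.2]])

sequences = list(itertools.product(range(2), repeat=N))
A1 = [x for x in sequences if sum(x) == 0]        # accept H_1
A2 = [x for x in sequences if sum(x) == N]        # accept H_2
AR = [x for x in sequences if 0 < sum(x) < N]     # reject all hypotheses

alpha_21 = worst_case_prob(G1, A2, n_states=2, N=N)       # as in (3)
alpha_R1 = worst_case_prob(G1, AR, n_states=2, N=N)       # as in (4)
alpha_1 = worst_case_prob(G1, A2 + AR, n_states=2, N=N)   # as in (5)
print(alpha_21, alpha_R1, alpha_1)
```

The enumeration grows as $|\mathcal{S}|^N |\mathcal{X}|^N$, so this is only meant for checking intuition on very small examples.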



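So we study the following error probability exponents/reliabilities (log-s and exp-s being to the base 2) of the errors (3) and (4):

$$E_{l,m}(\varphi) \triangleq \limsup_{N \to \infty} -\frac{1}{N} \log \alpha_{l,m}(\varphi_N), \quad l \neq m = \overline{1,M}, \tag{6}$$

$$E_{R,m}(\varphi) \triangleq \limsup_{N \to \infty} -\frac{1}{N} \log \alpha_{R,m}(\varphi_N), \quad m = \overline{1,M}, \tag{7}$$

where $\varphi \triangleq \{\varphi_N\}_{N=1}^{\infty}$. With $E_m(\varphi)$ denoting the exponent of $\alpha_m(\varphi_N)$ defined in the same way, it follows from (5) that

$$E_m(\varphi) = \min_{l \neq m} \left[E_{l,m}(\varphi), E_{R,m}(\varphi)\right].$$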

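In view of the achievability concept [13] for reliabilities in M-ary HT, consider the $M(M + 1)$-dimensional point $E \triangleq \{E_{R,m}, E_m\}_{m = \overline{1,M}}$ with respect to the error exponent pairs $\left(-\frac{1}{N} \log \alpha_{R,m}(\varphi_N), -\frac{1}{N} \log \alpha_m(\varphi_N)\right)$, where the decision regions $\mathcal{A}_N^m$ ($m = \overline{1,M}$) and $\mathcal{A}_N^R$ satisfy $\mathcal{A}_N^m \cap \mathcal{A}_N^l = \emptyset$ for $m \neq l$, $\mathcal{A}_N^m \cap \mathcal{A}_N^R = \emptyset$ and $\bigcup_m \mathcal{A}_N^m = \mathcal{X}^N \setminus \mathcal{A}_N^R$.

Definition 1. $E$ is called achievable if for all $\varepsilon > 0$ there exists a decision scheme $\{\mathcal{A}_N^m\}_{m=1}^{M}$ and $\mathcal{A}_N^R$ with the properties

$$-\frac{1}{N} \log \alpha_{R,m}(\varphi_N) > E_{R,m} - \varepsilon, \qquad -\frac{1}{N} \log \alpha_m(\varphi_N) > E_m - \varepsilon$$

for $N$ large enough. Let $\mathcal{R}_{AVS}(M,R)$ denote the set of all achievable reliabilities.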

III. Basic Properties 

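Here we summarize some necessary material on typical sequences [?]. Let $\mathcal{P}(\mathcal{S}) \triangleq \{P(s),\ s \in \mathcal{S}\}$ be the collection of all PD's on $\mathcal{S}$ and let $PG$ be the marginal PD on $\mathcal{X}$ defined by $PG(x) \triangleq \sum_{s \in \mathcal{S}} P(s) G(x|s)$, $x \in \mathcal{X}$.

The type of the vector $\mathbf{s} \in \mathcal{S}^N$ is the empirical PD

$$P_{\mathbf{s}}(s) \triangleq \frac{1}{N} N(s|\mathbf{s}), \tag{8}$$

where $N(s|\mathbf{s})$ is the number of occurrences of $s$ in $\mathbf{s}$. Denote the set of all types of $N$-length state vectors by $\mathcal{P}^N(\mathcal{S})$. For a pair of sequences $\mathbf{x} \in \mathcal{X}^N$ and $\mathbf{s} \in \mathcal{S}^N$ let $N(x,s|\mathbf{x},\mathbf{s})$ be the number of occurrences of $(x,s)$ in $\{x_n, s_n\}_{n=1}^{N}$. The conditional type $G_{\mathbf{x},\mathbf{s}}$ of the vector $\mathbf{x}$ with respect to the vector $\mathbf{s}$ is defined by

$$G_{\mathbf{x},\mathbf{s}}(x|s) \triangleq N(x,s|\mathbf{x},\mathbf{s}) / N(s|\mathbf{s}), \quad x \in \mathcal{X},\ s \in \mathcal{S}. \tag{9}$$

The joint type of the vectors $\mathbf{x}$ and $\mathbf{s}$ is the PD $P_{\mathbf{s}} \circ G_{\mathbf{x},\mathbf{s}} \triangleq \{P_{\mathbf{s}}(s) G_{\mathbf{x},\mathbf{s}}(x|s),\ x \in \mathcal{X},\ s \in \mathcal{S}\}$. For brevity the type notations can be used without indices. Let $\mathcal{G}^N(\mathcal{X}|\mathcal{S})$ be the set of all conditional types (9) and $\mathcal{G}(\mathcal{X})$ be the set of all distributions defined on $\mathcal{X}$. Denote by $\mathcal{T}_G^N(X|\mathbf{s})$ the set of vectors $\mathbf{x}$ which have the conditional type $G$ for given $\mathbf{s}$ having type $P$. Let the conditional entropy of $G$ given type $P$ be $H(G|P)$. The notation $H(Q)$ will stand for the unconditional entropy of $Q \in \mathcal{P}(\mathcal{X})$. Denote by $D(G \| G_m | P)$ the KL divergence between $G$ and $G_m$ given type $P$ and by $D(PG \| PG_m)$ the one between the marginals $PG$ and $PG_m$. The following inequality holds for every $G_m \in \mathcal{G}_m$:

$$D(G \| G_m | P) \ge D(PG \| PG_m). \tag{10}$$

We need the next properties:

$$|\mathcal{G}^N(\mathcal{X}|\mathcal{S})| < (N + 1)^{|\mathcal{X}||\mathcal{S}|}, \tag{11}$$

$$|\mathcal{T}_G^N(X|\mathbf{s})| \le \exp\{N H(G|P)\}. \tag{12}$$

For a PD $G_m \in \mathcal{G}(\mathcal{X}|\mathcal{S})$ the sequence $\mathbf{x} \in \mathcal{T}_G^N(X|\mathbf{s})$ has the probability

$$G_m^N(\mathbf{x}|\mathbf{s}) = \exp\{-N(H(G|P) + D(G \| G_m | P))\}. \tag{13}$$

(12) and (13) give an estimate for the conditional type class probability:

$$G_m^N(\mathcal{T}_G^N(X|\mathbf{s})|\mathbf{s}) \ge (N + 1)^{-|\mathcal{X}||\mathcal{S}|} \exp\{-N D(G \| G_m | P)\}, \tag{14}$$

$$G_m^N(\mathcal{T}_G^N(X|\mathbf{s})|\mathbf{s}) \le \exp\{-N D(G \| G_m | P)\}. \tag{15}$$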
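To make the type notation concrete, the following minimal sketch computes the state type (8) and the conditional type (9) of a short pair of sequences and checks identity (13) numerically, with logarithms to base 2 as in the text. The sequences and the matrix `G_m` are hypothetical values picked for illustration only.

```python
import math
from collections import Counter

def state_type(s, n_states):
    """Empirical distribution P_s of the state vector s, as in (8)."""
    counts = Counter(s)
    return [counts[b] / len(s) for b in range(n_states)]

def conditional_type(x, s, n_symbols, n_states):
    """Conditional type G_{x,s}(a|b), as in (9); rows are states b."""
    joint = Counter(zip(x, s))
    per_state = Counter(s)
    return [[joint[(a, b)] / per_state[b] if per_state[b] else 0.0
             for a in range(n_symbols)] for b in range(n_states)]

def cond_entropy(G, P):
    """H(G|P) in bits."""
    return -sum(P[b] * g * math.log2(g)
                for b, row in enumerate(G) for g in row if g > 0)

def cond_divergence(G, Gm, P):
    """D(G || Gm | P) in bits; assumes Gm has no zero entries."""
    return sum(P[b] * g * math.log2(g / Gm[b][a])
               for b, row in enumerate(G) for a, g in enumerate(row) if g > 0)

# Hypothetical data: binary alphabet, two states, N = 6.
x = [0, 1, 1, 0, 1, 0]
s = [0, 0, 1, 1, 1, 0]
G_m = [[0.7, 0.3],
       [0.4, 0.6]]

P = state_type(s, 2)
G = conditional_type(x, s, 2, 2)

# Left side of (13): log2 of the product of G_m(x_n | s_n).
log_prob = sum(math.log2(G_m[sn][xn]) for xn, sn in zip(x, s))
# Right side of (13): -N (H(G|P) + D(G || G_m | P)).
exponent = -len(x) * (cond_entropy(G, P) + cond_divergence(G, G_m, P))
print(log_prob, exponent)
```

The two printed numbers agree up to floating-point error, which reflects that (13) is an exact identity for every sequence in the conditional type class.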
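
IV. Region of Achievable Reliabilities

Introduce the following convex hulls for each $m = \overline{1,M}$:

$$\mathcal{W}_m \triangleq \Big\{W_m(x) \triangleq \sum_{s \in \mathcal{S}} \lambda_s G_{m,s}(x|s)\Big\}, \tag{16}$$

where $x \in \mathcal{X}$, $0 \le \lambda_s \le 1$, $\sum_{s \in \mathcal{S}} \lambda_s = 1$, and the region

$$\mathcal{E}_{AVS}(M,R) \triangleq \Big\{E : \forall W\ \exists m\ (m = \overline{1,M}) \text{ s.t. } \min_{W_m \in \mathcal{W}_m} D(W \| W_m) > E_m \text{ and } \exists W \text{ s.t. } \min_{W_m \in \mathcal{W}_m} D(W \| W_m) > E_{R,m} \text{ for all } m\Big\}. \tag{17}$$

Our main result shows that (17) completely characterizes $\mathcal{R}_{AVS}(M,R)$.

Theorem 1: $\mathcal{E}_{AVS}(M,R)$ is an achievable region of reliabilities: $\mathcal{E}_{AVS}(M,R) \subset \mathcal{R}_{AVS}(M,R)$. Moreover, if $E \in \mathcal{R}_{AVS}(M,R)$, then for any $\delta > 0$, $E_\delta \in \mathcal{E}_{AVS}(M,R)$, where $E_\delta \triangleq \{E_{R,m} - \delta, E_m - \delta\}_{m = \overline{1,M}}$.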


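Proof: For the direct part, if $E \in \mathcal{E}_{AVS}(M,R)$, then from (10), (12) and the type class estimates above, for any type $G \in \mathcal{G}^N(\mathcal{X}|\mathcal{S})$ and $\mathbf{s} \in \mathcal{S}^N$ with type $P_{\mathbf{s}} = P$ we have

$$G_{m,\mathbf{s}}^N(\overline{\mathcal{A}_N^m}) = \sum_{\mathbf{x} \in \overline{\mathcal{A}_N^m}} G_{m,\mathbf{s}}^N(\mathbf{x}|\mathbf{s}) \le \sum_{\mathcal{T}_G^N(X|\mathbf{s}) \subset \overline{\mathcal{A}_N^m}} \exp\{-N D(G \| G_{m,\mathbf{s}} | P)\} \le |\mathcal{G}^N(\mathcal{X}|\mathcal{S})| \exp\{-N D(PG \| PG_{m,\mathbf{s}})\}. \tag{18}$$

For every $W_m \in \mathcal{W}_m$ there exists $\mathbf{s} \in \mathcal{S}^N$ such that $W_m = P_{\mathbf{s}} G_{m,\mathbf{s}}$. Hence, from (18) and (17) we come to

$$\alpha_m(\varphi_N) \le |\mathcal{G}^N(\mathcal{X}|\mathcal{S})| \exp\{-N \min_{W_m \in \mathcal{W}_m} D(W \| W_m)\} \le |\mathcal{G}^N(\mathcal{X}|\mathcal{S})| \exp\{-N E_m\} \le \exp\{-N(E_m - \delta)\}.$$

In the same way we could get the necessary inequality for $\alpha_{R,m}(\varphi_N)$, that is

$$\alpha_{R,m}(\varphi_N) \le \exp\{-N(E_{R,m} - \delta)\}. \tag{19}$$

This closes the proof of the direct part.

For the converse we assume that $E \in \mathcal{R}_{AVS}(M,R)$. This provides that for every $\varepsilon > 0$ there exists a decision scheme $\{\mathcal{A}_N^m, \mathcal{A}_N^R\}_{m=1}^{M}$ that makes the following inequalities true as soon as $N > N_0(\varepsilon)$:

$$-\frac{1}{N} \log \alpha_{R,m}(\varphi_N) > E_{R,m} - \varepsilon, \qquad -\frac{1}{N} \log \alpha_m(\varphi_N) > E_m - \varepsilon, \tag{20}$$

for all $m$'s. Pick a $\delta > 0$ and show that

$$\forall W\ \exists m \text{ s.t. } \min_{W_m \in \mathcal{W}_m} D(W \| W_m) > E_m - \delta, \tag{21}$$

$$\exists W \text{ s.t. } \min_{W_m \in \mathcal{W}_m} D(W \| W_m) > E_{R,m} - \delta \text{ for all } m. \tag{22}$$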

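For that we prove the next fact. For every $W_m \in \mathcal{W}_m$ and $\mathcal{A}_N \subseteq \mathcal{X}^N$ the inequality holds:

$$W_m^N(\mathcal{A}_N) \le \max_{\mathbf{s} \in \mathcal{S}^N} G_m^N(\mathcal{A}_N|\mathbf{s}). \tag{23}$$

To show (23), first note that for $W_m \in \mathcal{W}_m$ there exists a collection of $\lambda_s$'s (by (16)) s.t. $W_m = \sum_{s \in \mathcal{S}} \lambda_s G_{m,s}$. Whence, for $\lambda_{\mathbf{s}} \triangleq \prod_{n=1}^{N} \lambda_{s_n}$ and any $\mathcal{A}_N \subset \mathcal{X}^N$, $\mathbf{x} \in \mathcal{A}_N$, the following estimate holds:

$$W_m^N(\mathbf{x}) = \prod_{n=1}^{N} W_m(x_n) = \prod_{n=1}^{N} \sum_{s \in \mathcal{S}} \lambda_s G_m(x_n|s) = \sum_{\mathbf{s} \in \mathcal{S}^N} \lambda_{\mathbf{s}} \prod_{n=1}^{N} G_m(x_n|s_n) \le \max_{\mathbf{s} \in \mathcal{S}^N} \prod_{n=1}^{N} G_m(x_n|s_n) = \max_{\mathbf{s} \in \mathcal{S}^N} G_m^N(\mathbf{x}|\mathbf{s}).$$

Summing the same expansion over $\mathbf{x} \in \mathcal{A}_N$ gives $W_m^N(\mathcal{A}_N) = \sum_{\mathbf{s} \in \mathcal{S}^N} \lambda_{\mathbf{s}} G_m^N(\mathcal{A}_N|\mathbf{s})$, a convex combination that cannot exceed its largest term. Therefore

$$W_m^N(\mathcal{A}_N) \le \max_{\mathbf{s} \in \mathcal{S}^N} G_m^N(\mathcal{A}_N|\mathbf{s})$$

for every $W_m \in \mathcal{W}_m$ and $\mathcal{A}_N \subset \mathcal{X}^N$. Turning to (21), by the continuity of $D(\cdot \| W_m)$ there exists a type $Q \in \mathcal{P}^N(\mathcal{X})$ that for $N > N_1(\delta)$ and a fixed $m$ satisfies

$$D(Q \| W_m) \le D(W \| W_m) + \delta/2. \tag{24}$$

Let $W_m \triangleq \arg\min_{W_m \in \mathcal{W}_m} D(Q \| W_m) > E_m - \delta/2$; then in light of (23) and (12) we have

$$\alpha_m(\varphi_N) \ge W_m^N(\overline{\mathcal{A}_N^m}) \ge W_m^N(\overline{\mathcal{A}_N^m} \cap \mathcal{T}_Q^N(X)) = \sum_{\overline{\mathcal{A}_N^m} \cap \mathcal{T}_Q^N(X)} \exp\{-N[H(Q) + D(Q \| W_m)]\} = |\overline{\mathcal{A}_N^m} \cap \mathcal{T}_Q^N(X)| \exp\{-N H(Q)\} \exp\{-N D(Q \| W_m)\}.$$

Note that $|\overline{\mathcal{A}_N^m} \cap \mathcal{T}_Q^N(X)| \exp\{-N H(Q)\} \ge \exp\{-N\delta/4\}$ for $N > N_2(\delta)$. It follows from the inequality $|\overline{\mathcal{A}_N^m} \cap \mathcal{T}_Q^N(X)| \ge |\mathcal{T}_Q^N(X)|/M$, which implies that

$$|\overline{\mathcal{A}_N^m} \cap \mathcal{T}_Q^N(X)| \exp\{-N H(Q)\} \ge |\mathcal{T}_Q^N(X)| \exp\{-N H(Q)\} \exp\Big\{-N \frac{\log M}{N}\Big\} \ge \exp\{-N\delta/4\}.$$

Whence, for $N > \max\{N_1(\delta), N_2(\delta)\}$ we have

$$\alpha_m(\varphi_N) \ge \exp\{-N[D(Q \| W_m) + \delta/4]\} \ge \exp\{-N[D(W \| W_m) + 3\delta/4]\},$$

that with (20) and $\varepsilon = \delta/4$ gives

$$E_m - \delta < -\frac{1}{N} \log \alpha_m(\varphi_N) - \frac{3\delta}{4} \le D(W \| W_m) \tag{25}$$

for $N > \max\{N_0(\varepsilon), N_1(\delta), N_2(\delta)\}$ and for every $m = \overline{1,M}$.

Now we have to proceed with the proof of (22). Suppose again $W_m = \arg\min_{W_m \in \mathcal{W}_m} D(Q \| W_m) > E_m - \delta/2$. For a picked $\delta > 0$, if $E_\delta \notin \mathcal{E}_{AVS}(M,R)$ then

$$\forall W\ \exists m \text{ satisfying } D(W \| W_m) < E_{R,m} - \delta.$$

According to (23), (12), (24) and (25) we have

$$\alpha_{R,m}(\varphi_N) \ge W_m^N(\mathcal{A}_N^R) \ge W_m^N(\mathcal{A}_N^R \cap \mathcal{T}_Q^N(X)) = \sum_{\mathcal{A}_N^R \cap \mathcal{T}_Q^N(X)} \exp\{-N[H(Q) + D(Q \| W_m)]\} = |\mathcal{A}_N^R \cap \mathcal{T}_Q^N(X)| \exp\{-N H(Q)\} \exp\{-N D(Q \| W_m)\} \ge \exp\{-N[D(W \| W_m) + 3\delta/4]\} > \exp\{-N[E_{R,m} - \delta/4]\}.$$

However, the last inequality is in conflict with (20) for $\varepsilon < \delta/4$ and $N > \max\{N_0(\varepsilon), N_1(\delta), N_2(\delta)\}$. ■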
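Inequality (23), which drives the converse, can be checked numerically on a small instance. The sketch below is an illustration under assumed values only: the matrix `G_m`, the block length and the set `A` are hypothetical, and the mixture weights are scanned on a coarse grid rather than treated analytically.

```python
import itertools
import numpy as np

def prob_under_states(G, A, s):
    """G^N(A | s): total probability of the set A under state vector s."""
    return sum(float(np.prod([G[sn, xn] for xn, sn in zip(x, s)])) for x in A)

def prob_under_mixture(W, A):
    """W^N(A): probability of A when symbols are drawn i.i.d. from W."""
    return sum(float(np.prod([W[xn] for xn in x])) for x in A)

# Hypothetical AVS hypothesis: rows are states, columns are symbols.
G_m = np.array([[0.9, 0.1],
                [0.6, 0.4]])
N = 3
sequences = list(itertools.product(range(2), repeat=N))
state_vectors = list(itertools.product(range(2), repeat=N))
A = [x for x in sequences if sum(x) >= 2]   # an arbitrary test set

# Right side of (23): worst case over all state vectors.
worst = max(prob_under_states(G_m, A, s) for s in state_vectors)

# Left side of (23) for mixtures W = lam * G_m[0] + (1 - lam) * G_m[1].
gaps = []
for lam in np.linspace(0.0, 1.0, 11):
    W = lam * G_m[0] + (1.0 - lam) * G_m[1]
    gaps.append(worst - prob_under_mixture(W, A))

print(min(gaps))  # non-negative up to rounding if (23) holds on this instance
```

The check only covers one set and one matrix, so it supports intuition about (23) and is not a substitute for the argument in the proof.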

V. Optimal decision schemes 

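Here we look for optimal decision schemes and the corresponding best error exponents in the following sense (similar to the LAO test [6], [14]). Let $E_m$, $m = \overline{1,M}$, be fixed: what are the "maximum" values for $\{E^*_{l,m}, E^*_{R,m}\}_{l \neq m = \overline{1,M}}$ such that there is no other $\{E'_{l,m}, E'_{R,m}\}_{l \neq m = \overline{1,M}}$ satisfying

$$E'_{l,m} > E^*_{l,m} \text{ and } E'_{R,m} > E^*_{R,m} \text{ for all } l \neq m = \overline{1,M}?$$

Consider the following test sequence $\varphi^*$ in terms of the sets

$$\mathcal{B}_R \triangleq \{W : \min_{W_m \in \mathcal{W}_m} D(W \| W_m) > E_m \text{ for all } m\},$$

$$\mathcal{B}_m \triangleq \{W : \min_{W_m \in \mathcal{W}_m} D(W \| W_m) < E_m\}, \quad m = \overline{1,M}.$$

Define ($l \neq m = \overline{1,M}$):

$$E_{R,m}(\varphi^*) \triangleq E^*_{R,m} \triangleq \min_{W \in \mathcal{B}_R} \min_{W_m \in \mathcal{W}_m} D(W \| W_m), \tag{26}$$

$$E_{l,m}(\varphi^*) \triangleq E^*_{l,m} \triangleq \min_{W \in \mathcal{B}_l} \min_{W_m \in \mathcal{W}_m} D(W \| W_m). \tag{27}$$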



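Theorem 2: Let the following inequalities hold:

$$E_1 < \min_m \Big\{\min_{W_m \in \mathcal{W}_m,\, W_1 \in \mathcal{W}_1} D(W_m \| W_1)\Big\},$$

$$E_m < \min\Big\{\min_{l \neq m} E^*_{l,m},\ \min_{l = \overline{m+1,M}}\ \min_{W_l \in \mathcal{W}_l,\, W_m \in \mathcal{W}_m} D(W_l \| W_m)\Big\},$$

then there exists an optimal sequence of tests, and the corresponding optimal vector of reliabilities is defined as in (26)-(27).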

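Proof: Let the decision on $R$ or on some $m$ be made based on the partition: $\mathcal{D}_m = \bigcup_{W \in \mathcal{B}_m} \mathcal{T}_W^N(X)$, $\mathcal{D}_R = \bigcup_{W \in \mathcal{B}_R} \mathcal{T}_W^N(X)$. Note that $\mathcal{D}_m \cap \mathcal{D}_l = \emptyset$ and $\mathcal{D}_m \cap \mathcal{D}_R = \emptyset$, $m \neq l = \overline{1,M}$.

For $W_m \triangleq \arg\min_{W_m \in \mathcal{W}_m} D(W \| W_m)$, $m = \overline{1,M}$, and $\varphi^* = \{\varphi^*_N\}_{N=1}^{\infty}$ we obtain (applying the unconditional version of (14))

$$\alpha_{R,m}(\varphi^*_N) \ge W_m^N(\mathcal{D}_R) = W_m^N\Big(\bigcup_{W \in \mathcal{B}_R} \mathcal{T}_W^N(X)\Big) \ge \max_{W \in \mathcal{B}_R} \exp\{-N[D(W \| W_m) + o_N(1)]\} = \exp\{-N[\min_{W \in \mathcal{B}_R} D(W \| W_m) + o_N(1)]\}.$$

In a similar way we can obtain the inequality

$$\alpha_{l,m}(\varphi^*_N) \ge \exp\{-N[\min_{W_m \in \mathcal{W}_m} \min_{W \in \mathcal{B}_l} D(W \| W_m) + o_N(1)]\}. \tag{28}$$

The proofs of the converse inequalities

$$\alpha_{R,m}(\varphi^*_N) \le \exp\{-N[\min_{W_m \in \mathcal{W}_m} \min_{W \in \mathcal{B}_R} D(W \| W_m) + o_N(1)]\}, \tag{29}$$

$$\alpha_{l,m}(\varphi^*_N) \le \exp\{-N[\min_{W_m \in \mathcal{W}_m} \min_{W \in \mathcal{B}_l} D(W \| W_m) + o_N(1)]\} \tag{30}$$

are omitted here because of space restrictions.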



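Taking into account (28), (29), (30) and the continuity of the functional $D(W \| W_m)$, we obtain that the limit $\lim_{N \to \infty} \{-N^{-1} \log \alpha_{l,m}(\varphi^*_N)\}$ exists and equals $E^*_{l,m}$.

The proof will be accomplished if we demonstrate that $\varphi^*$ is optimal. Let $\varphi'$ be a test defined by the sets $(\mathcal{D}'_m, \mathcal{D}'_R)$ s.t.

$$E'_{l,m} > E^*_{l,m}, \quad E'_{R,m} > E^*_{R,m}, \quad l \neq m = \overline{1,M}.$$

It yields for $N$ large enough that

$$\alpha_{l,m}(\varphi'_N) < \alpha_{l,m}(\varphi^*_N), \quad \alpha_{R,m}(\varphi'_N) < \alpha_{R,m}(\varphi^*_N).$$

Below we examine the relation between $(\mathcal{D}_m, \mathcal{D}_R)$ and $(\mathcal{D}'_m, \mathcal{D}'_R)$. Four cases are possible:

1) $\mathcal{D}_m \cap \mathcal{D}'_m = \emptyset$,

2) $\mathcal{D}_m \subset \mathcal{D}'_m$,

3) $\mathcal{D}'_m \subset \mathcal{D}_m$,

4) $\mathcal{D}_m \cap \mathcal{D}'_m \neq \emptyset$.

The same cases exist also for $\mathcal{D}_R$ and $\mathcal{D}'_R$.

Consider the case $\mathcal{D}_m \cap \mathcal{D}'_m = \emptyset$. It follows that there exists $l \neq m$ such that $\mathcal{D}_m \cap \mathcal{D}'_l \neq \emptyset$. That is, $\exists W$ such that $D(W \| W_m) < E_m$ and $\mathcal{T}_W^N(X) \subset \mathcal{D}'_l$. Compute

$$\alpha_{l,m}(\varphi'_N) = \max_{\mathbf{s} \in \mathcal{S}^N} G_m^N(\mathcal{D}'_l|\mathbf{s}) \ge \exp\{-N[D(W \| W_m) + o_N(1)]\} \ge \exp\{-N[E_m + o_N(1)]\}.$$

Thus $E'_{l,m} \le E_m < E^*_{l,m}$, which contradicts the assumption $E'_{l,m} > E^*_{l,m}$. ■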



VI. Geometric interpretations 



Fig. 1 presents a geometric interpretation of the decision scheme in Theorem 2. Relatedly, Fig. 2 and Fig. 3 illustrate the geometry of the Chernoff bounds derived in [20] for the multi-HT where rejection is not an alternative (cf. [?] for DMS's). Those interpretations are comprehensible with the conceptual details given in [20].

VII. Results for DMS 

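Under the assumption $|\mathcal{S}| = 1$ we get the model of multi-HT with rejection for DMS:

$$H_m: G^* = G_m,\ m = \overline{1,M}, \qquad H_R: \text{none of } H_m\text{'s is true},$$

with $G_m \triangleq \{G_m(x),\ x \in \mathcal{X}\}$, $m = \overline{1,M}$. The problem here is to make a decision regarding the generic $G^*$ among $M$ alternative PD's $G_m$, $m = \overline{1,M}$, and the rejection. Let

$$\mathcal{E}(M,R) \triangleq \{E : \forall Q\ \exists m\ (m = \overline{1,M}) \text{ s.t. } D(Q \| G_m) > E_m \text{ and } \exists Q \text{ s.t. } D(Q \| G_m) > E_{R,m} \text{ for all } m\}.$$

Theorem 3: Theorem 1 implies that $\mathcal{E}(M,R) \subset \mathcal{R}(M,R)$. Conversely, if $E \in \mathcal{R}(M,R)$, then for any $\delta > 0$, $E_\delta \in \mathcal{E}(M,R)$, where $E_\delta \triangleq \{E_{R,m} - \delta, E_m - \delta\}_{m = \overline{1,M}}$.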



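To formulate the DMS counterpart of Theorem 2 define the sets:

$$\mathcal{B}_R(\text{DMS}) \triangleq \{Q : D(Q \| G_m) > E_m \text{ for all } m = \overline{1,M}\},$$

$$\mathcal{B}_m(\text{DMS}) \triangleq \{Q : D(Q \| G_m) < E_m\}, \quad m = \overline{1,M}.$$

Furthermore

$$E^*_{R,m} \triangleq \min_{Q \in \mathcal{B}_R(\text{DMS})} D(Q \| G_m), \quad m = \overline{1,M}, \tag{31}$$

$$E^*_{l,m} \triangleq \min_{Q \in \mathcal{B}_l(\text{DMS})} D(Q \| G_m), \quad l \neq m = \overline{1,M}. \tag{32}$$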
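The sets $\mathcal{B}_m(\text{DMS})$ and $\mathcal{B}_R(\text{DMS})$ suggest a simple type-based decision rule, sketched below under stated assumptions. The function names, the hypothesis distributions and the thresholds are hypothetical; hypotheses are indexed from 0, rejection is returned as `None`, and if several sets happen to contain the observed type the sketch picks the smallest divergence, which is a choice made here and not taken from the paper.

```python
import math
from collections import Counter

def kl(Q, G):
    """D(Q || G) in bits; assumes G has no zero entries."""
    return sum(q * math.log2(q / g) for q, g in zip(Q, G) if q > 0)

def empirical_type(x, n_symbols):
    """Type Q of the observed sequence x over symbols 0..n_symbols-1."""
    counts = Counter(x)
    return [counts[a] / len(x) for a in range(n_symbols)]

def decide(x, hypotheses, E, n_symbols):
    """Return the index m whose set B_m contains the type of x, else None."""
    Q = empirical_type(x, n_symbols)
    divergences = [kl(Q, G) for G in hypotheses]
    accepted = [m for m, d in enumerate(divergences) if d < E[m]]
    if not accepted:
        return None  # the type is outside every B_m: reject all hypotheses
    return min(accepted, key=lambda m: divergences[m])

# Hypothetical binary example with two hypotheses and equal thresholds.
hypotheses = [[0.8, 0.2], [0.3, 0.7]]
E = [0.05, 0.05]

print(decide([0] * 8 + [1] * 2, hypotheses, E, 2))   # type equals G_1
print(decide([0, 1] * 5, hypotheses, E, 2))          # type far from both
print(decide([1] * 7 + [0] * 3, hypotheses, E, 2))   # type equals G_2
```

Only the type of the observation enters the decision, which mirrors how the optimal regions in the text are built as unions of type classes.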




Fig. 1: Multiple HT with rejection. 




Fig. 2: Chernoff bounds: binary HT: AVS. 




Fig. 3: Chernoff bounds: multiple HT: AVS.



Theorem 4: If $D(G_m \| G_l) > 0$, $m \neq l = \overline{1,M}$, and

$$E_1 < \min_m \{D(G_m \| G_1)\},$$

$$E_m < \min\Big\{\min_{l \neq m} E^*_{l,m},\ \min_{l = \overline{m+1,M}} D(G_l \| G_m)\Big\},$$

then there exist optimal tests, and the corresponding optimal vector of reliabilities is defined according to (31)-(32).

According to [22], the authors claim to have obtained Theorem 4 independently.

Remark 1: It is possible to prove that

$$\min_{l = \overline{1,M},\ l \neq m} \left[E^*_{l,m}, E^*_{R,m}\right] = E^*_{R,m}, \quad \text{for all } m = \overline{1,M}.$$

This means that discrimination is always easier than rejection.
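The relation in Remark 1 can be observed on a small binary DMS example by approximating (31) and (32) on a grid. This is a numerical illustration under assumed values (the distributions `G`, the thresholds `E` and the grid resolution are hypothetical), not a proof, and the minima over open sets are only approximated by grid points.

```python
import math

def kl(q, g):
    """D(Q || G) in bits for binary PDs given by their probability of symbol 0."""
    total = 0.0
    for a, b in ((q, g), (1.0 - q, 1.0 - g)):
        if a > 0:
            total += a * math.log2(a / b)
    return total

# Hypothetical hypotheses G_1, G_2 (probability of symbol 0) and thresholds.
G = [0.2, 0.7]
E = [0.05, 0.05]
grid = [i / 10000 for i in range(10001)]

# Grid versions of B_m(DMS) and B_R(DMS).
B = [[q for q in grid if kl(q, G[m]) < E[m]] for m in range(2)]
B_R = [q for q in grid if all(kl(q, G[m]) > E[m] for m in range(2))]

# Grid approximations of (31) and (32).
E_R = [min(kl(q, G[m]) for q in B_R) for m in range(2)]
E_lm = {(l, m): min(kl(q, G[m]) for q in B[l])
        for l in range(2) for m in range(2) if l != m}

for (l, m), value in sorted(E_lm.items()):
    print(f"E*_{l + 1},{m + 1} ~ {value:.4f}   E*_R,{m + 1} ~ {E_R[m]:.4f}")
```

In this example the rejection exponents sit close to the thresholds `E[m]`, while the discrimination exponents are noticeably larger, in line with the remark.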